Goto

Collaborating Authors

 weakly-supervised reinforcement learning


Weakly-Supervised Reinforcement Learning for Controllable Behavior

Neural Information Processing Systems

Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks. However, in many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve. Can we instead constrain the space of tasks to those that are semantically meaningful? In this work, we introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical chaff tasks. We show that this learned subspace enables efficient exploration and provides a representation that captures distance between states. On a variety of challenging, vision-based continuous control problems, our approach leads to substantial performance gains, particularly as the complexity of the environment grows.


Weakly-Supervised Reinforcement Learning for Controllable Behavior

Neural Information Processing Systems

Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks. However, in many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve. Can we instead constrain the space of tasks to those that are semantically meaningful? In this work, we introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks. We show that this learned subspace enables efficient exploration and provides a representation that captures distance between states. On a variety of challenging, vision-based continuous control problems, our approach leads to substantial performance gains, particularly as the complexity of the environment grows.


Review for NeurIPS paper: Weakly-Supervised Reinforcement Learning for Controllable Behavior

Neural Information Processing Systems

Summary and Contributions: This paper proposes a framework for goal-conditioned RL with a goal representation whose structure is learned from weak human supervision. Most goal-conditioned RL methods either use the raw image as a goal, or an encoding learned with an unsupervised method such as a VAE. This paper takes as input a (relatively small) dataset of images, and asks human annotators to rank semantic attributes for pairs of image (which has higher lighting, which one has a door which is more open, etc). The algorithm operates in two phases: 1. Using the weak supervision signal from the human annotators, a disentangled representation is learning using a GAN-type loss on triplets of 2 images and one binary label.


Review for NeurIPS paper: Weakly-Supervised Reinforcement Learning for Controllable Behavior

Neural Information Processing Systems

The paper proposes a way to incorporate weak supervision, in the form of pairwise comparisons along various axes, into a goal-directed reinforcement learning framework, showing how this supervision can identify relevant latent factors for the construction of new tasks. The reviewers agree that this is a novel approach and makes an important step toward fully unsupervised approaches. As such, we are recommending acceptance.


Weakly-Supervised Reinforcement Learning for Controllable Behavior

Neural Information Processing Systems

Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks. However, in many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve. Can we instead constrain the space of tasks to those that are semantically meaningful? In this work, we introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks. We show that this learned subspace enables efficient exploration and provides a representation that captures distance between states. On a variety of challenging, vision-based continuous control problems, our approach leads to substantial performance gains, particularly as the complexity of the environment grows.